14 research outputs found

    GREAT: open source software for statistical machine translation

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-011-9097-6
    In this article, the first public release of GREAT as an open-source statistical machine translation (SMT) software toolkit is described. GREAT is based on a bilingual language modelling approach to SMT, which has so far been implemented for n-gram models within the framework of stochastic finite-state transducers. The use of finite-state models is motivated by their simplicity, their versatility, and their lower computational cost compared with other, more expressive models. Moreover, if translation is assumed to be a subsequential process, finite-state models are sufficient for modelling the existing relations between a source and a target language. GREAT includes some characteristics usually present in state-of-the-art SMT, such as phrase-based translation models or a log-linear framework for local features. Experimental results on the well-known Europarl corpus are reported in order to validate this software. A competitive translation quality is achieved, while using both fewer model parameters and a lower response time than the widely used, state-of-the-art SMT system Moses. © 2011 Springer Science+Business Media B.V.
    This study was supported by the EC (FEDER, FSE), the Spanish government (MICINN, MITyC, “Plan E”, under Grants MIPRCV “Consolider Ingenio 2010”, iTrans2 TIN2009-14511, and erudito.com TSI-020110-2009-439), and the Generalitat Valenciana (Grant Prometeo/2009/014).
    González Mollá, J.; Casacuberta Nolla, F. (2011). GREAT: open source software for statistical machine translation. Machine Translation. 25(2):145-160. https://doi.org/10.1007/s10590-011-9097-6

    The O3N2 and N2 abundance indicators revisited: improved calibrations based on CALIFA and Te-based literature data

    Astronomy and Astrophysics 559 (2013): A114, reproduced with permission from Astronomy and Astrophysics.
    Integral field spectroscopy has recently made it possible to measure the emission line fluxes of an increasingly large number of star-forming galaxies, both locally and at high redshift. Many studies have used these fluxes to derive the gas-phase metallicity of the galaxies by applying the so-called strong-line methods. However, the metallicity indicators that these datasets use were empirically calibrated using few direct abundance data points (Te-based measurements). Furthermore, a precise determination of the prediction intervals of these indicators is commonly lacking in these calibrations. Such limitations might lead to systematic errors in determining the gas-phase metallicity, especially at high redshift, which might have a strong impact on our understanding of the chemical evolution of the Universe. The main goal of this study is to review the most widely used empirical oxygen calibrations, O3N2 and N2, using new direct abundance measurements. We pay special attention to (1) the expected uncertainty of these calibrations as a function of the index value or abundance derived and (2) the presence of possible systematic offsets. This is possible thanks to the analysis of the most ambitious compilation of Te-based H II regions to date. This new dataset compiles the Te-based abundances of 603 H II regions extracted from the literature and also includes new measurements from the CALIFA survey. Besides providing new and improved empirical calibrations for the gas abundance, we also compare our revisited calibrations with a total of 3423 additional CALIFA H II complexes whose abundances were derived using the ONS calibration from the literature. The combined analysis of Te-based and ONS abundances allows us to derive the most accurate calibration to date for both the O3N2 and N2 single-ratio indicators in terms of statistical significance, quality, and coverage of the parameter space. In particular, we infer that these indicators show shallower abundance dependencies and statistically significant offsets compared with other calibrations. The O3N2 and N2 indicators can be applied empirically to derive oxygen abundances, with calibrations based either on direct abundance determinations, with random errors of 0.18 and 0.16, respectively, or on indirect ones (but based on a large amount of data), reaching an average precision of 0.08 and 0.09 dex (random) and 0.02 and 0.08 dex (systematic; compared to the direct estimations), respectively.
    R.A. Marino is funded by the Spanish program of International Campus of Excellence Moncloa (CEI). D. Mast thanks the Plan Nacional de Investigación y Desarrollo funding programs, AYA2012-31935 of the Spanish Ministerio de Economía y Competitividad, for the support given to this project. S.F.S. thanks the Ramón y Cajal project RyC-2011-07590 of the Spanish Ministerio de Economía y Competitividad for the support given to this project. F.F.R.O. acknowledges the Mexican National Council for Science and Technology (CONACYT) for financial support under the program Estancias Postdoctorales y Sabáticas al Extranjero para la Consolidación de Grupos de Investigación, 2010-2012. We acknowledge financial support for the ESTALLIDOS collaboration by the Spanish Ministerio de Ciencia e Innovación under grant AYA2010-21887-C04-03. BG-L also acknowledges support from the Spanish Ministerio de Economía y Competitividad (MINECO) under grant AYA2012-39408-C02-02. J.F.-B. acknowledges financial support from the Ramón y Cajal Program and grant AYA2010-21322-C03-02 from the Spanish Ministry of Economy and Competitiveness (MINECO), as well as from the DAGAL network of the People’s Program (Marie Curie Actions) of the European Union’s Seventh Framework Program FP7/2007-2013/ under REA grant agreement number PITN-GA-2011-289313. CK has been funded by project AYA2010-21887 from the Spanish PNAYA. P.P. acknowledges support by the Fundação para a Ciência e a Tecnologia (FCT) under project FCOMP-01-0124-FEDER-029170 (Reference FCT PTDC/FIS-AST/3214/2012), funded by FCT-MEC (PIDDAC) and FEDER (COMPETE). R.M.G.D. and R.G.B. also acknowledge support from the Spanish Ministerio de Economía y Competitividad (MINECO) under grant AyA2010-15081. V.S., L.G., and A.M.M. acknowledge financial support from the Fundação para a Ciência e a Tecnologia (FCT) under program Ciência 2008 and the research grant PTDC/CTE-AST/112582/200

    Comprehensive cross-platform comparison of methods for non-invasive EGFR mutation testing: results of the RING observational trial

    Several platforms for noninvasive EGFR testing are currently used in the clinical setting, with sensitivities ranging from 30% to 100%. Prospective studies evaluating agreement and the sources of discordant results are still lacking. Herein, seven methodologies, including two next-generation sequencing (NGS)-based methods, three high-sensitivity PCR-based platforms, and two FDA-approved methods, were compared using 72 plasma samples from EGFR-mutant non-small-cell lung cancer (NSCLC) patients progressing on a first-line tyrosine kinase inhibitor (TKI). NGS platforms as well as high-sensitivity PCR-based methodologies showed excellent agreement for EGFR-sensitizing mutations (K = 0.80-0.89) and substantial agreement for T790M testing (K = 0.77 and 0.68, respectively). Mutant allele frequencies (MAFs) obtained by different quantitative methods showed excellent reproducibility (intraclass correlation coefficients 0.86-0.98). Among other technical factors, discordant calls mostly occurred at MAFs ≤ 0.5%. Agreement improved significantly when samples with MAF ≤ 0.5% were discarded. EGFR mutations were detected at significantly lower MAFs in patients with brain metastases, suggesting that these patients are at risk of a false-positive result. Our results support the use of liquid biopsies for noninvasive EGFR testing and highlight the need to report MAFs systematically.
    Keywords: NGS; circulating free DNA; epidermal growth factor receptor; non-small-cell lung cancer; osimertinib; tyrosine kinase inhibitor
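
    The agreement values quoted above (K) read like Cohen's kappa, although the abstract does not name the statistic, so that is an assumption here. As a minimal sketch of how such a value is obtained for two platforms, the function below compares paired calls on the same samples; the function name and the sample calls are hypothetical and are not data from the trial.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two lists of categorical calls on the same samples."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both platforms need the same, non-zero number of calls")
    n = len(calls_a)
    # Observed agreement: fraction of samples where both platforms agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over labels.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both platforms always give the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for four plasma samples (illustrative only).
platform_1 = ["mutant", "mutant", "wild-type", "wild-type"]
platform_2 = ["mutant", "wild-type", "wild-type", "wild-type"]
kappa = cohen_kappa(platform_1, platform_2)
```

    Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is commonly preferred over simple concordance when one call (for example wild-type) dominates.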

    Ahora / Ara

    The fifth edition of the micro-story collection for the eradication of violence against women, published by the Institut Universitari d’Estudis Feministes i de Gènere «Purificación Escribano» of the Universitat Jaume I, aims to be a declaration of hope. This is the moment in which women (and men) must take a step forward and eliminate systemic violence against women. Now is the time to denounce machismo and micro-machismos and to begin building a more egalitarian society. Each of the stories in the book is a denunciation and a declaration that leads us towards a better world.

    Aprendizaje de transductores estocásticos de estados finitos y su aplicación en traducción automática

    Machine translation is an area of computational linguistics that investigates the use of software to translate text or speech in natural language into its representation in a target language, also in natural language. In recent decades there has been a strong push towards the use of statistical techniques for developing machine translation systems. Applying these methods to a particular language pair requires a parallel corpus for that pair. The appeal of these techniques is that a system can be developed without expert work by specialists in linguistics. Finite-state models have long been used successfully in many and varied disciplines within scientific research applied to natural language, including machine translation. Finite-state models offer a number of advantages over other statistical models, such as their straightforward integration into speech recognition environments, their application in computer-assisted translation systems, and their ability to process information without needing it to be complete, by means of an architecture based on the popular assembly-line (pipeline) design. The aim of the research is the study and exploitation of machine translation techniques based on finite-state models. The work presented in this thesis is a detailed analysis of the GIATI methodology for learning stochastic finite-state transducers so that they can be applied effectively and efficiently as models in machine translation, allowing their use on translation tasks with large volumes of data.
    González Mollá, J. (2009). Aprendizaje de transductores estocásticos de estados finitos y su aplicación en traducción automática [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6289

    Control de la evolución de un robot móvil con visión estereoscópica

    Human beings have always sought ways to create beings in their own likeness that can perform the most difficult jobs or everyday tasks. Over time, these beings, known as robots, have become easy to use and even programmable, so that a single robot can adapt to different needs. Today there are many robots on the market that can be bought and programmed to suit our tasks. However, programming them is not accessible to everyone: adequate training is needed to understand how to program them correctly. The aim of this project is to create an easy, intuitive language for controlling the Surveyor SRV-1 robot, with simple commands that control the robot's movements and interact with the user or the environment. Programs in this language are written as XML files, because abundant resources exist for processing XML. We have also developed a simulator of the robot, so that the behaviour of programs can be observed without the real robot. In both cases, simulator or real robot, programs can be written by editing XML files or through an interface that requires minimal computer knowledge. This report describes both the specification of the language and the technologies integrated in the project, such as stereoscopic vision and voice communication with the robot.
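
    The abstract does not give the command set of the XML language, so the following is only a sketch of the general idea of interpreting XML-encoded movement commands in a simulator. The element names (`program`, `forward`, `turn`), their attributes, and the `run_program` function are all hypothetical and are not taken from the project.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical program: element and attribute names are assumptions.
PROGRAM = """
<program>
  <forward distance="10"/>
  <turn angle="90"/>
  <forward distance="5"/>
</program>
"""


def run_program(xml_text):
    """Simulate the program and return the final (x, y, heading_in_degrees)."""
    x, y, heading = 0.0, 0.0, 0.0
    root = ET.fromstring(xml_text.strip())
    for command in root:
        if command.tag == "forward":
            # Move along the current heading by the given distance.
            distance = float(command.attrib["distance"])
            x += distance * math.cos(math.radians(heading))
            y += distance * math.sin(math.radians(heading))
        elif command.tag == "turn":
            # Rotate counter-clockwise by the given angle, kept within [0, 360).
            heading = (heading + float(command.attrib["angle"])) % 360
        else:
            raise ValueError(f"unknown command: {command.tag}")
    return x, y, heading


final_x, final_y, final_heading = run_program(PROGRAM)
```

    Keeping the interpreter separate from the program text means the same XML file could, in principle, drive either a simulator like this one or a back end that sends commands to real hardware.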

    Aperture-corrected spectroscopic type Ia supernova host galaxy properties

    We use type Ia supernova (SN Ia) data obtained by the Sloan Digital Sky Survey-II Supernova Survey (SDSS-II SNS) in combination with the publicly available SDSS DR16 fiber spectroscopy of supernova (SN) host galaxies to correlate SN Ia light-curve parameters and Hubble residuals with several host galaxy properties. Fixed-aperture fiber spectroscopy suffers from aperture effects: the fraction of the galaxy covered by the fiber varies depending on its projected size on the sky, and thus measured properties are not representative of the whole galaxy. The advent of integral field spectroscopy has provided a way to correct for the missing light, by studying how these galaxy parameters change with the aperture size. Here we study how the standard SN host galaxy relations change once global host galaxy parameters are corrected for aperture effects. We recover previous trends of SN Hubble residuals with host galaxy properties, but we find that discarding objects with poor fiber coverage instead of correcting for aperture loss introduces biases into the sample that affect SN host galaxy relations. The net effect of applying the commonly used g-band fraction criterion is that intrinsically faint SNe Ia in high-mass galaxies are discarded, thus artificially increasing the height of the mass step by 0.02 mag and its significance. Current and next-generation fixed-aperture fiber-spectroscopy surveys, such as OzDES, DESI, or TiDES with 4MOST, that aim to study SN and galaxy correlations must consider, and correct for, these effects.
    Key words: dark energy / galaxies: star formation / techniques: spectroscopic / supernovae: general / galaxies: abundances
    ⋆ Full Table D.1 is only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/cat/J/A+A/659/A8

    Entorno colaborativo de desarrollo de proyectos fin de grado en titulaciones tecnológicas

    Managing and monitoring final-year degree projects (trabajos fin de grado, TFG) requires a great deal of time and is usually done individually, with no collaboration or feedback among students. Because the projects are developed and presented individually, the collaborative, teamwork spirit has been lost in such an important course. The specific objectives of this work are: 1. to design a collaborative, agile methodology for monitoring TFGs; 2. to propose and evaluate collaborative tools for specific TFG tasks; and 3. to compare the new methodology with conventional ones. Questionnaires given to the students supervised by members of the network will be used as instruments for collecting information. In the design phase of the experience, the tools and methodology to be followed with the group of supervised students will be set out. Specifically, an innovative approach to managing TFG students through the use of collaborative tools is proposed. A series of materials and resources will be designed and made available to students.

    EMPEZAR A PROGRAMAR USANDO JAVA

    This book is an introduction to the methodological design of programs, with an emphasis on the use of the data types that those programs manipulate to represent the domain of the problems they solve. Specifically, the approach to program design followed in this book is the one known as Object-Oriented Programming; it uses Java as the vehicle language, covers the usual topics of a course on small-scale programming, and makes efficiency the ultimate criterion for designing programs and data types. Although this book is aimed mainly at first-year students of the new Degree in Computer Science (Grado en Informática), it may also be useful in other university studies or even in academic and industrial settings where a sound grounding in the construction and analysis of programs is necessary.
    Llorens Agost, ML.; Gómez Adrian, JA.; Galiano Ronda, IR.; Herrero Cuco, C.; Marqués Hernández, F.; Casanova Faus, A.; González Mollá, J.... (2016). EMPEZAR A PROGRAMAR USANDO JAVA. Editorial Universitat Politècnica de València. http://hdl.handle.net/10251/70965

    First scientific observations with MEGARA at GTC

    On June 25th, 2017, the new intermediate-resolution optical IFU and MOS of the 10.4-m GTC had its first light. As part of the tests carried out to verify the performance of the instrument in its two modes (IFU and MOS) and 18 spectral setups (an identical number of VPHs, with resolutions R=6000-20000 from 0.36 to 1 micron), a number of astronomical objects were observed. These observations show that MEGARA@GTC is set to fill a niche of high-throughput, intermediate-resolution IFU and MOS observations of extremely faint narrow-lined objects. Lyman-α absorbers, star-forming dwarfs, or even weak absorptions in stellar spectra in our Galaxy or in the Local Group can now be explored to a new level. Thus, the versatility of MEGARA in terms of observing modes and spectral resolution and coverage will allow GTC to go beyond current observational limits in either depth or precision for all these objects. The results to be presented in this talk clearly demonstrate the potential of MEGARA in this regard.